Day 16 - Regular expressions - Multiple matches
Exercise 16.03
The log file simple.log contains the IP address of the client for each request. IP addresses are made of
four numbers separated by dots (i.e. A.B.C.D), where each number goes from 0 to 255 (thus having
from 1 to 3 digits). Find the 5 IP addresses that occur most often in the file, along with the
number of times each one occurs.
Solution
$ grep -Eo "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" simple.log | sort | uniq -c \
  | sort -nr | head -n 5
482 66.249.73.135
364 46.105.14.53
357 130.237.218.86
273 75.97.9.59
113 50.16.19.13
The regular expression is made of four repetitions of [0-9]{1,3}, which matches 1 to 3 adjacent
digits, separated by \. which is a literal dot (remember that a dot without the escape backslash
matches any character). The first sort places identical addresses next to each other, which uniq -c
needs in order to count them, since it only collapses adjacent duplicate lines. The last sort -nr
orders the list again using the numerical sort (which orders according to the count, as this is at the
beginning of each line), and in reverse order, from the biggest number down to 1. Finally, head -n 5
selects the top 5 from the list.
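To see how the stages of the pipeline above fit together, it can help to run them on a tiny file whose counts are easy to check by hand. The following is a minimal sketch: the sample lines, the addresses in them, and the variable names tmp and top are made up for illustration and are not taken from simple.log. The trailing awk is an addition that only strips the leading padding uniq -c puts before each count.

```shell
# Build a tiny sample file (hypothetical data, not taken from simple.log).
tmp=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - "GET /a"' \
  '10.0.0.2 - - "GET /b"' \
  '10.0.0.1 - - "POST /c"' \
  '10.0.0.3 - - "GET /d"' \
  '10.0.0.1 - - "GET /e"' \
  '10.0.0.2 - - "GET /f"' > "$tmp"

# grep -Eo : print only the matched text, one match per line
# sort     : put identical addresses on adjacent lines
# uniq -c  : collapse adjacent duplicates, prefixing each with its count
# sort -nr : order numerically by that count, largest first
# head -n 2: keep the first two lines
# awk      : normalise the whitespace around the count (illustration only)
top=$(grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$tmp" \
  | sort | uniq -c | sort -nr | head -n 2 | awk '{print $1, $2}')
echo "$top"

rm -f "$tmp"
```

In this sample 10.0.0.1 appears three times and 10.0.0.2 twice, so those are the two lines kept. Note that the pattern only checks the shape of an address: it would also accept a string such as 999.999.999.999, since it does not verify that each number is at most 255.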
Go back to the exercise
Exercise 16.04
The file simple.log contains the HTTP method used in the request (for example GET or POST) followed
by a space and the rest of the log line. For each request that uses GET, print the HTTP method and
everything that follows.
Solution